The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practices and bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development; 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based, and of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once; this was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
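To make the most commonly reported workaround concrete, here is a minimal sketch of the sampling step behind patch-based training, with hypothetical shapes and function names that are not taken from any surveyed solution:

```python
import numpy as np

def sample_patches(volume, patch_size=(64, 64, 64), n_patches=8, rng=None):
    """Randomly crop fixed-size patches from a sample too large to process at once."""
    rng = rng or np.random.default_rng()
    patches = []
    for _ in range(n_patches):
        corner = [rng.integers(0, s - p + 1) for s, p in zip(volume.shape, patch_size)]
        slices = tuple(slice(c, c + p) for c, p in zip(corner, patch_size))
        patches.append(volume[slices])
    return np.stack(patches)

# Example: a synthetic 3D scan larger than a typical training batch could hold
scan = np.random.rand(256, 256, 180).astype(np.float32)
batch = sample_patches(scan, patch_size=(64, 64, 64), n_patches=4)
print(batch.shape)  # (4, 64, 64, 64)
```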
Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities show a rising trend of homogeneity in their model structures, which creates the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into five components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
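The five-component decomposition can be pictured as follows; the class and argument names here are illustrative only and are not TencentPretrain's actual API:

```python
import torch.nn as nn

class PretrainModel(nn.Module):
    """Hypothetical sketch of composing a pre-training model from five components."""
    def __init__(self, embedding, encoder, target_embedding=None, decoder=None, target=None):
        super().__init__()
        self.embedding = embedding                 # tokens/patches/frames -> vectors
        self.encoder = encoder                     # e.g., a Transformer encoder
        self.target_embedding = target_embedding   # only for encoder-decoder models
        self.decoder = decoder
        self.target = target                       # pre-training objective head (MLM, LM, ...)

    def forward(self, src, tgt=None):
        hidden = self.encoder(self.embedding(src))
        if self.decoder is not None:
            hidden = self.decoder(self.target_embedding(tgt), hidden)
        return self.target(hidden)
```

Swapping, say, a text embedding for a patch embedding while keeping the rest fixed is the kind of reuse the modular design is meant to enable.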
Convolutional Neural Networks (CNNs) have demonstrated superiority in learning patterns, but are sensitive to label noise and may overfit noisy labels during training. The early stopping strategy averts updating CNNs during the early training phase and is widely employed in the presence of noisy labels. Motivated by biological findings that the amplitude spectrum (AS) and phase spectrum (PS) in the frequency domain play different roles in the animal visual system, we observe that PS, which captures more semantic information, can increase the robustness of DNNs to label noise, more so than AS can. We thus propose early stops at different times for AS and PS by disentangling the features of some layer(s) into AS and PS using the Discrete Fourier Transform (DFT) during training. Our proposed Phase-AmplituDe DisentangLed Early Stopping (PADDLES) method is shown to be effective on both synthetic and real-world label-noise datasets. PADDLES outperforms other early stopping methods and obtains state-of-the-art performance.
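The disentanglement step itself is straightforward to sketch in PyTorch; which layers to apply it to and the per-spectrum stopping schedule are the paper's contribution and are not reproduced here:

```python
import torch

def disentangle_as_ps(feat):
    """Split a feature map into amplitude and phase spectra via the 2D DFT."""
    spec = torch.fft.fft2(feat)            # DFT over the last two dims
    return spec.abs(), spec.angle()

def recombine(amplitude, phase):
    spec = torch.polar(amplitude, phase)   # amplitude * exp(i * phase)
    return torch.fft.ifft2(spec).real

feat = torch.randn(2, 16, 32, 32)          # (batch, channels, H, W)
amp, ph = disentangle_as_ps(feat)
rec = recombine(amp, ph)
print(torch.allclose(rec, feat, atol=1e-5))  # True, up to numerical error
```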
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
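Since the checkpoints are released publicly, a minimal usage sketch through the Hugging Face transformers library might look as follows; the smaller bloom-560m variant is assumed here so the example runs on modest hardware:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The 176B checkpoint requires multi-GPU inference; a smaller released
# variant keeps this sketch runnable on a single machine.
model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```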
A marketing campaign is a set of strategic activities that promote a business's goals. In real industrial scenarios, predicting the effect of marketing campaigns is complex and challenging, because the prior knowledge is usually learned from observational data without any intervention by the marketing campaign. Moreover, each subject is always interfered with simultaneously by several marketing campaigns, so the effect of a single campaign cannot be easily parsed and evaluated. To the best of our knowledge, there is currently no effective method for solving such problems, i.e., modeling an individual-level prediction task based on a hierarchical structure with multiple intertwined events. In this paper, we provide an in-depth analysis of the structure of the underlying parse trees involved in the effect prediction task and further propose a Hierarchical cApsule Prediction Network (HAPNET) to predict the effects of marketing campaigns. Extensive results based on both synthetic data and real data demonstrate the superiority of our model over state-of-the-art methods and show its remarkable practicality in real industrial applications.
Deep reinforcement learning (RL) algorithms suffer severe performance degradation when interaction data is scarce, which limits their real-world applicability. Recently, visual representation learning has been shown to be effective and promising for improving RL sample efficiency. These methods usually rely on contrastive learning and data augmentation to train a transition model for state prediction, which differs from how the model is used in RL—value-based planning. Consequently, the learned model may not align well with the environment and may fail to produce consistent value predictions, especially when state transitions are not deterministic. To address this issue, we propose a novel method called Value-Consistent Representation Learning (VCR), which learns representations that are directly related to decision-making. More specifically, VCR trains a model to predict the future state (also referred to as the "imagined state") based on the current state and a sequence of actions. Instead of aligning this imagined state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two action-value distributions. A distance is then computed and minimized to force the imagined state to produce action-value predictions similar to those of the real state. We develop two implementations of the above idea for discrete and continuous action spaces respectively. We conduct experiments on the Atari 100K and DeepMind Control Suite benchmarks to validate the effectiveness of our method in improving sample efficiency. Our method achieves new state-of-the-art performance among search-free RL algorithms.
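A single-step sketch of the value-consistency objective is given below; the module names are hypothetical, the paper conditions on a sequence of actions rather than one, and the mean-squared distance is just one plausible choice for comparing the two action-value outputs:

```python
import torch
import torch.nn.functional as F

# encoder: observation -> latent state; dynamics: (state, action) -> next state;
# q_head: state -> action values over discrete actions.
def vcr_loss(encoder, dynamics, q_head, obs, action, next_obs):
    state = encoder(obs)
    imagined = dynamics(state, action)    # "imagined state"
    real = encoder(next_obs).detach()     # real next state from the environment
    q_imagined = q_head(imagined)         # (batch, n_actions)
    q_real = q_head(real)
    # Force the imagined state to yield action values similar to the real one,
    # rather than matching the two states themselves.
    return F.mse_loss(q_imagined, q_real)
```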
Exposure to ideas in domains outside a scientist's own may benefit her in reformulating existing research problems in novel ways and discovering new application domains for existing solution ideas. While improved performance in scholarly search engines can help scientists efficiently identify relevant advances in domains they may already be familiar with, it may fall short of helping them explore diverse ideas outside such domains. In this paper we explore the design of systems aimed at augmenting end-users' ability to carry out cross-domain exploration with flexible query specification. To this end, we develop an exploratory search system in which end-users can select a portion of text core to their interest from a paper abstract and retrieve papers that have a high similarity to the user-selected core aspect but differ in terms of domains. Furthermore, end-users can 'zoom in' to specific domain clusters to retrieve more papers from them and understand nuanced differences within the clusters. Our case studies with scientists uncover opportunities and design implications for systems aimed at facilitating cross-domain exploration and inspiration.
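The core retrieval step could be sketched as follows, assuming an off-the-shelf sentence encoder as a stand-in for the system's actual models; the candidate titles and domain labels are fabricated for illustration:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf encoder

# The user-selected core aspect and a toy candidate pool with domain labels
core_span = "contrastive pre-training for dense retrieval"
candidates = [
    ("Contrastive representation learning for molecules", "chemistry"),
    ("Dense retrieval with hard negatives", "NLP"),
    ("Metric learning for galaxy morphology", "astronomy"),
]

core_emb = model.encode(core_span, convert_to_tensor=True)
cand_embs = model.encode([title for title, _ in candidates], convert_to_tensor=True)
scores = util.cos_sim(core_emb, cand_embs)[0]

# Rank by similarity to the core aspect; the UI would then facet by domain
for (title, domain), score in sorted(zip(candidates, scores), key=lambda x: -x[1]):
    print(f"{score:.2f}  [{domain}]  {title}")
```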
In addition to pixel-wise features, recent segmentation methods that exploit "class-level" information, such as OCR and CPNet, have achieved notable success in improving the accuracy of existing network modules. However, the extracted class-level information is simply concatenated with the pixel features, without being explicitly exploited for better pixel representation learning. Moreover, these approaches learn soft class centers from coarse mask predictions, which is prone to error accumulation. In this paper, aiming to use class-level information more effectively, we propose a universal Class-Aware Regularization (CAR) approach to optimize intra-class variance and inter-class distance during feature learning, motivated by the fact that humans can recognize an object by itself regardless of which other objects it appears beside. Three novel loss functions are proposed: the first encourages more compact class representations within each class, the second directly maximizes the distance between different class centers, and the third further pushes apart the distance between class centers and pixels of other classes. Furthermore, the class centers in our approach are generated directly from the ground truth rather than from error-prone coarse predictions. Our method can be easily applied to most existing segmentation models, including OCR and CPNet, and can largely improve their accuracy with no additional inference overhead. Extensive experiments and ablation studies on multiple benchmark datasets demonstrate that the proposed CAR can boost the accuracy of all baseline models by up to 2.23% mIOU with superior generalization ability. The complete code is available at https://github.com/edwardyehuang/CAR.
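A rough sketch of the three regularizers, with class centers derived from ground truth, is shown below; the formulations are a paraphrase for illustration, not the official implementation from the repository above:

```python
import torch

def car_losses(pixel_feats, labels, num_classes):
    """pixel_feats: (N, C) flattened pixel features; labels: (N,) ground-truth class ids."""
    centers = []
    for k in range(num_classes):
        mask = labels == k
        centers.append(pixel_feats[mask].mean(dim=0) if mask.any()
                       else torch.zeros_like(pixel_feats[0]))
    centers = torch.stack(centers)                 # (K, C), computed from ground truth

    # 1) intra-class compactness: pull pixels toward their own class center
    intra = ((pixel_feats - centers[labels]) ** 2).sum(dim=1).mean()

    # 2) inter-class separation: push different class centers apart
    center_dists = torch.cdist(centers, centers)   # (K, K)
    inter_c2c = -center_dists.sum() / max(num_classes * (num_classes - 1), 1)

    # 3) center-to-pixel separation: push pixels away from other classes' centers
    all_d = torch.cdist(pixel_feats, centers)      # (N, K)
    own_d = all_d.gather(1, labels.unsqueeze(1))   # distance to own center
    inter_c2p = -((all_d.sum(dim=1, keepdim=True) - own_d) / max(num_classes - 1, 1)).mean()

    return intra, inter_c2c, inter_c2p
```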
Many 3D representations (e.g., point clouds) are discrete samples of an underlying continuous 3D surface. This process inevitably introduces sampling variations on the underlying 3D shape. In learning 3D representations, such variations should be ignored, while the transferable knowledge of the underlying 3D shape should be captured. This poses a great challenge for existing representation-learning paradigms. This paper studies autoencoding on point clouds. The standard autoencoding paradigm forces the encoder to capture such sampling variations, because the decoder has to reconstruct the original point cloud with its particular sampling. We introduce the Implicit AutoEncoder (IAE), a simple yet effective method that addresses this challenge by replacing the point-cloud decoder with an implicit decoder. The implicit decoder outputs a continuous representation that is shared among different point-cloud samplings of the same model. Reconstructing under the implicit representation encourages the encoder to discard sampling variations, introducing more room to learn useful features. We theoretically justify this claim under a simple linear autoencoder. Moreover, the implicit decoder offers a rich space for designing suitable implicit representations for different tasks. We demonstrate the usefulness of IAE on various self-supervised learning tasks for 3D objects and 3D scenes. Experimental results show that IAE consistently outperforms the state of the art on every task.
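A minimal sketch of an implicit decoder is given below; the architecture and the choice of field (occupancy vs. signed distance) are assumptions for illustration:

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Maps (latent code, 3D query point) -> a continuous field value, so the
    reconstruction target no longer depends on one particular point sampling."""
    def __init__(self, latent_dim=256, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # occupancy logit (or signed distance)
        )

    def forward(self, latent, queries):
        # latent: (B, D) code from the point-cloud encoder; queries: (B, Q, 3)
        lat = latent.unsqueeze(1).expand(-1, queries.shape[1], -1)
        return self.mlp(torch.cat([lat, queries], dim=-1)).squeeze(-1)
```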
In this work, we consider algorithms for (nonlinear) regression problems with an $\ell_0$ penalty. Existing algorithms for $\ell_0$-based optimization problems typically proceed with a fixed step size, and choosing an appropriate step size depends on the restricted strong convexity and smoothness of the loss function, which is therefore difficult to compute in practice. In the spirit of support detection and root finding \cite{HJK2020}, we propose a novel and efficient data-driven line-search rule to adaptively determine the appropriate step size. We prove an $\ell_2$ error bound for the proposed algorithm without imposing restrictive conditions on the cost function. Extensive numerical comparisons with state-of-the-art algorithms on linear and logistic regression problems show the stability, effectiveness, and superiority of the proposed algorithm.
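The paper's data-driven rule is not reproduced here, but the generic hard-thresholding iteration it adapts can be sketched with a simple backtracking step size as a stand-in:

```python
import numpy as np

def iht_l0(X, y, s, max_iter=200, step0=1.0, shrink=0.5):
    """Iterative hard thresholding for least squares with an l0 constraint
    (keep the s largest coefficients). The backtracking rule below is a
    simple stand-in, not the paper's data-driven line search."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        grad = X.T @ (X @ beta - y) / n
        step = step0
        while step > 1e-8:
            cand = beta - step * grad
            keep = np.argsort(np.abs(cand))[-s:]   # s largest-magnitude entries
            new = np.zeros(p)
            new[keep] = cand[keep]
            if np.linalg.norm(y - X @ new) <= np.linalg.norm(y - X @ beta):
                beta = new
                break
            step *= shrink  # backtrack until the objective does not increase
    return beta
```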